Retrieving Syntactically Annotated Corpora using a Relational Database
Similar resources
Syntactically annotated corpora of Estonian
Syntactically annotated corpora are needed 1) to train and test parsers and various language technology products (grammar checkers, information retrievers and extractors, machine translators, etc.); 2) to check the agreement of existing linguistic theories with real language usage. The corpora can be annotated at different levels of depth. In shallow syntactically annotated corpora a syntact...
A Description Language for Syntactically Annotated Corpora
This paper introduces a description language for syntactically annotated corpora which allows for encoding both the syntactic annotation to a corpus and the queries to a syntactically annotated corpus. In terms of descriptive adequacy and computational efficiency, the description language is a compromise between script-like corpus query languages and high-level, typed unification-based grammar ...
Mining Syntactically Annotated Corpora with XQuery
This paper presents a uniform approach to data extraction from syntactically annotated corpora encoded in XML. XQuery, which incorporates XPath, has been designed as a query language for XML. The combination of XPath and XQuery offers flexibility and expressive power, while corpus specific functions can be added to reduce the complexity of individual extraction tasks. We illustrate our approach...
System for Querying Syntactically Annotated Corpora
This paper presents a system for querying treebanks. The system consists of a powerful query language with natural support for cross-layer queries, a client interface with a graphical query builder and visualizer of the results, a command-line client interface, and two substitutable query engines: a very efficient engine using a relational database (suitable for large static data), and a slower...
Extracting Collocations from Syntactically Annotated Biomedical Corpora
This thesis investigates the extraction of frequently used phrases (so-called collocations) from biomedical text sources. The extraction of uninterrupted collocation candidates is introduced. For interrupted candidates, with gaps between their subcomponents, a new technique using suffix tries is developed. It is based on the iterative extension of frequent smaller patterns. This reduces computa...
Journal
Journal title: Journal of Natural Language Processing
Year: 2007
ISSN: 1340-7619,2185-8314
DOI: 10.5715/jnlp.14.4_3